Deployment of SAR and GMTI Signal Processing on a Boeing 707 Aircraft using
pMatlab and a Bladed Linux Cluster


Jeremy  Kepner,  Tim  Currie,  Hahn  Kim,  Andrew  McCabe,  Bipin  Mathew,  Michael  Moore, 
Dan Rabinkin, Albert Reuther, Andrew Rhoades, Nadya Travinin, and Lou Tella

MIT  Lincoln  Laboratory 
Phone:  781-981-3108 
Email  Addresses: 

{kepner, currie, hgk, amccabe, matthewb, moore, rabinkin, reuther, rhoades, nt, tella}@ll.mit.edu


Abstract 

The  Lincoln  Multifunction  Intelligence,  Surveillance  and  Reconnaissance  Testbed  (LiMIT)  is  an 
airborne  research  laboratory  for  development,  testing,  and  evaluation  of  sensors  and  processing 
algorithms.  During  flight  tests  it  is  desirable  to  process  the  sensor  data  to  validate  the  sensors  and 
to  provide  targets  and  images  for  use  in  other  on  board  applications.  Matlab  is  used  for  this 
processing  because  of  the  rapidly  changing  nature  of  the  algorithms,  but  requires  hours  to 
process  the  required  data  on  a  single  workstation.  The  pMatlab  and  MatlabMPI  libraries  allow 
these  algorithms  to  be  parallelized  quickly  without  porting  the  code  to  a  new  language.  The 
availability  of  inexpensive  bladed  Linux  clusters  provides  the  necessary  parallel  hardware  in  a 
reasonable  form  factor.  We  have  integrated  pMatlab  and  a  28  processor  IBM  Blade  system  to 
implement  Ground  Moving  Target  Indicator  (GMTI)  processing  and  Synthetic  Aperture  Radar 
(SAR)  processing  on  board  the  LiMIT  Boeing  707  aircraft.  GMTI  processing  uses  a  simple 
round  robin  approach  and  is  able  to  achieve  a  speedup  of  18x.  SAR  processing  uses  a  more 
complex data parallel approach, which involves multiple "corner turns" and is able to achieve a
speedup  of  12x.  In  each  case,  the  required  detections  and  images  are  produced  in  under  five 
minutes  (as  opposed  to  one  hour),  which  is  sufficient  for  in  flight  action  to  be  taken. 

1.  Introduction 

Airborne  sensor  research  platforms  traditionally  record  data  in  the  air  and  process  it  later  on  the 
ground. On board processing has been impractical because of rapidly changing algorithms, the
cost  of  parallel  processing  hardware,  and  the  time  to  implement  the  algorithms  in  a  real-time 
programming  environment.  This  situation  has  changed  with  the  advent  of  several  new 
technologies:  parallel  Matlab  (e.g.  pMatlab  and  MatlabMPI),  inexpensive  bladed  Linux  clusters, 
high-speed  disk  recording  systems,  and  on  board  high  bandwidth  networks.  Integrating  these 
technologies  on  board  the  aircraft  (Figure  1)  allows  processing  in  a  sufficiently  rapid  manner  for 
in  flight  action  to  be  taken.  This  talk  presents  the  overall  architecture  for  such  a  system  as 
demonstrated  on  the  Lincoln  Multifunction  Intelligence,  Surveillance  and  Reconnaissance 
Testbed  (LiMIT). 

2.  Approach 

The  LiMIT  signal  processor  goal  is  to  provide  in  flight  assessment  of  the  overall  performance  of 
the  radar  system,  and  to  provide  targets  and  images  for  use  in  other  on  board  applications.  Four 
technologies  are  the  foundation  of  the  LiMIT  on  board  processing  system:  parallel  Matlab  (e.g. 
pMatlab  and  MatlabMPI),  inexpensive  bladed  Linux  clusters,  high-speed  disk  recording  systems, 
and  an  on  board  high  bandwidth  network.  The  pMatlab  parallel  Matlab  toolbox  implements 
Global Array Semantics in the Matlab environment, which provides parallel data abstractions
that  allow  the  analyst  to  write  parallel  code  with  minor  modifications  to  their  serial  code. 
pMatlab  is  built  on  top  of  the  MatlabMPI  point-to-point  communications  library.  The  14  node 
28  CPU  bladed  Linux  cluster  provides  inexpensive  parallel  processing,  memory,  local  storage 
and  local  interconnect,  in  a  7U  form  factor,  that  supports  Matlab  and  all  its  libraries.  The  disk 
based  recording  system  can  be  mounted  via  a  conventional  network,  providing  a  simple  file 
system  between  the  recording  system  and  the  signal  processor.  A  rich  conventional  LAN  based 
interconnect  allows  the  signal  processor  to  use  standard  COTS  based  communication  protocols 
for reading the record system (e.g. NFS, FTP, ...), sending displays back to the operator (e.g.
X-windows), and sending output products to the rest of the system.

3.  Results 

The  above  four  technologies  were  used  to  implement  Ground  Moving  Target  Indicator  (GMTI) 
and  Synthetic  Aperture  Radar  (SAR)  processing  on  board  the  aircraft.  The  speedup  as  a  function 
of  number  of  processors  is  shown  in  Figure  2.  GMTI  processing  uses  a  simple  round  robin 
approach  and  is  able  to  achieve  a  speedup  of  ~18x.  SAR  processing  uses  a  more  complex  data 
parallel approach, which involves multiple "corner turns" and is able to achieve a speedup of
~12x. In each case, the required detections and images are produced in under five minutes, which is
sufficient  for  in  flight  action  to  be  taken.  Using  parallel  Matlab  on  a  cluster  allows  this  capability 
to  be  deployed  at  lower  cost  in  terms  of  hardware  and  software  when  compared  to  traditional 
approaches. 
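
The quoted speedups can be put in terms of parallel efficiency (speedup divided by processor count). The short Python sketch below does that arithmetic. It assumes the ~18x and ~12x figures were measured with all 28 processors in use, which the text implies but does not state outright; the function and variable names are illustrative only.

```python
# Parallel efficiency = speedup / processor count.
# Speedups (~18x GMTI, ~12x SAR) and the 28-processor count are taken from the text.
# Assumption: both speedups were measured with all 28 processors in use.
N_PROCESSORS = 28
SPEEDUPS = {"GMTI": 18.0, "SAR": 12.0}


def parallel_efficiency(speedup, n_processors):
    """Return the fraction of ideal linear speedup actually achieved."""
    return speedup / n_processors


efficiency = {
    name: parallel_efficiency(speedup, N_PROCESSORS)
    for name, speedup in SPEEDUPS.items()
}

for name, value in efficiency.items():
    print(f"{name}: {value:.0%} of ideal linear speedup")
```

Under that assumption, GMTI runs at roughly two thirds of ideal scaling and SAR at somewhat under half, which is consistent with the text's remark that the SAR corner turns make it the harder case.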


Figure  1:  LiMIT  Signal  Processing  Architecture. 

Figure 2: GMTI and SAR parallel processing performance.


Lincoln Multifunction Intelligence, Surveillance and Reconnaissance Testbed

-  Boeing  707  aircraft 

-  Fully  equipped  with  sensors  and  networking 

-  Airborne  research  laboratory  for  development,  testing,  and 
evaluation  of  sensors  and  processing  algorithms 

Employs  Standard  Processing  Model  for  Research  Platform 

-  Collect  in  the  air/process  on  the  ground 

*  Can we process radar data (SAR & GMTI) in flight and
provide  feedback  on  sensor  performance  in  flight? 


*  Requirements  and  Enablers 

-  Record  and  playback  data 

High  speed  RAID  disk  system 

-  High  speed  network 


-  High  density  parallel  computing 

Ruggedized  bladed  Linux  cluster 

-  Rapid  algorithm  development 

pMatlab 


14x2  CPU 

Goals

•  Matlab  speedup  through 
transparent  parallelism 

•  Near-real-time  rapid 
prototyping 


Lab-Wide  Usage 

•  Ballistic  Missile  Defense 

•  Laser  Propagation  Simulation 

•  Hyperspectral  Imaging 

•  Passive  Sonar 

•  Airborne  Ground  Moving 
Target  Indicator  (GMTI) 

•  Airborne  Synthetic  Aperture 
Radar  (SAR) 

Concept of Operations

Timeline

-  Record Streaming Data: ~1 seconds = 1 dwell
-  Copy to Bladed Cluster: ~30 seconds
-  Process on Bladed Cluster: 1st CPI ~1 minute, 2 Dwells ~2 minutes
-  Process on SGI: 1st CPI ~2 minutes, 2 Dwells ~1 hour

[Figure: system diagram. Labels: Streaming Sensor Data; RAID Disk Recorder; 600 MB/s; Gbit Ethernet; Copy w/rcp (1 TB local storage ~ 20 min data); (1x RT); Bladed Cluster Running pMatlab; SAR, GMTI, ... (new); Xwindows over LAN; To Other Systems]

•  Net benefit: 2 Dwells in 2 minutes vs. 1 hour

*  Tested only at operational (i.e. in-flight) levels:

-  0 dB = 1.4 G (above normal)

-  -3 dB = ~1.0 G (normal)

-  -6 dB = ~0.7 G (below normal)

*  Tested in all 3 dimensions

*  Ran MatlabMPI file based communication test on up to 14 CPUs/14 hard drives

*  Throughput decreases seen at 1.4 G


[Figure: throughput vs. message size (bytes); X-axis, 13 CPU/13 HD]
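
The three vibration levels are related by the usual decibel rule for amplitudes. The Python sketch below takes 1.4 G as the 0 dB reference and applies a factor of 10^(dB/20). The slide does not say which decibel convention was used; the amplitude convention is an inference, chosen because it reproduces the listed values of about 1.0 G and 0.7 G.

```python
# Convert relative vibration levels in dB to acceleration in G.
# Assumption: amplitude convention (factor of 10**(dB/20)), with the slide's
# 0 dB level of 1.4 G as the reference. The slide does not state the convention.
REFERENCE_G = 1.4


def level_in_g(db, reference_g=REFERENCE_G):
    """Return the acceleration in G for a level given in dB relative to the reference."""
    return reference_g * 10 ** (db / 20)


levels = {db: round(level_in_g(db), 1) for db in (0, -3, -6)}

for db, g in levels.items():
    print(f"{db:>3} dB -> {g} G")
```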

•  Temperature ranges

-  Test  range:  -20°C  to  40°C 

-  Bladecenter  spec:  10°C  to  35°C 

•  Cooling  tests 

-  Successfully  cooled  to  -10°C 

-  Failed  at  -20°C 

-  Cargo  bay  typically  >  0°C 

•  Heating  tests 

-  Used  duct  to  draw  outside  air  to 
cool  cluster  inside  oven 

-  Successfully  heated  to  40°C 

-  Outside  air  cooled  cluster  to  36°C 

*  IBM Bladecenter is not designed
for  707’s  operational 
environment 

*  Strategies  to  minimize  risk  of 
damage: 

1.  Power  down  during  takeoff/ 
landing 

•  Avoids  damage  to  hard  drives 

•  Radar  is  also  powered  down 

2.  Construct  duct  to  draw  cabin  air 
into  cluster 

•  Stabilizes  cluster  temperature 

•  Prevents  condensation  of  cabin  air  moisture  within  cluster 

SGI RAID System

Scan  catalog  files,  select  dwells  and 
CPIs  to  process  (C/C  shell) 

Assign  dwells/CPIs  to  nodes,  package 
up  signature  /  aux  data,  one  CPI  per 
file.  Transfer  data  from  SGI  to  each 
processor’s  disk  (Matlab) 

IBM  Bladed  Cluster 

Nodes  process  CPIs  in  parallel,  write 
results  onto  node  1’s  disk.  Node  1 
processor  performs  final 
processing 

Results  displayed  locally 

*  pMatlab allows integration to occur while algorithm is being finalized

MatlabMPI & pMatlab Software Layers

•  Can build a parallel library with a few messaging primitives

•  MatlabMPI provides this messaging capability:

MPI_Send(dest,comm,tag,X);
X = MPI_Recv(source,comm,tag);

•  Can build applications with a few parallel structures and functions

•  pMatlab provides parallel arrays and functions

X = ones(n,mapX);
Y = zeros(n,mapY);
Y(:,:) = fft(X);
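
Elsewhere the slides describe MatlabMPI communication as file based. The Python sketch below shows one way a send/receive pair can be built on a shared directory: the sender writes a buffer file and then a lock file, and the receiver waits for the lock file before reading. This is an illustration of the general idea only. The function names, file naming scheme, and use of pickle are all assumptions and are not the MatlabMPI implementation.

```python
import os
import pickle
import tempfile
import time


def _paths(comm_dir, source, dest, tag):
    """Build hypothetical buffer and lock file names for one message."""
    base = os.path.join(comm_dir, f"p{source}_p{dest}_t{tag}")
    return base + "_buffer.pkl", base + "_lock"


def send(comm_dir, source, dest, tag, data):
    """Write the message to a buffer file, then signal completion with a lock file."""
    buffer_path, lock_path = _paths(comm_dir, source, dest, tag)
    with open(buffer_path, "wb") as f:
        pickle.dump(data, f)
    # The lock file is created only after the buffer is fully written.
    open(lock_path, "w").close()


def recv(comm_dir, source, dest, tag, poll_s=0.01, timeout_s=5.0):
    """Poll for the lock file, then read and remove the message files."""
    buffer_path, lock_path = _paths(comm_dir, source, dest, tag)
    deadline = time.monotonic() + timeout_s
    while not os.path.exists(lock_path):
        if time.monotonic() > deadline:
            raise TimeoutError(f"no message from {source} with tag {tag}")
        time.sleep(poll_s)
    with open(buffer_path, "rb") as f:
        data = pickle.load(f)
    os.remove(buffer_path)
    os.remove(lock_path)
    return data


# Single-process demonstration: "rank 0" sends, "rank 1" receives.
with tempfile.TemporaryDirectory() as comm_dir:
    send(comm_dir, source=0, dest=1, tag=7, data=[1.0, 2.0, 3.0])
    received = recv(comm_dir, source=0, dest=1, tag=7)

print(received)
```

A scheme like this needs only a file system visible to both sides, which fits a cluster whose nodes mount each other's disks over a conventional network.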

GMTI Block Diagram


Parallel Implementation

Approach

-  Deal out CPIs to different CPUs

Performance

-  TIME/NODE/CPI ~100 sec

-  TIME FOR ALL 28 CPIS ~200 sec

-  Speedup ~14x
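
The ~14x figure follows from the two timings on this slide if the serial time is estimated as the per-CPI time multiplied by the number of CPIs. The Python sketch below does that arithmetic; the estimate of serial time is an assumption, since the slide does not give a measured single-processor run.

```python
# Timings from the slide (both approximate).
TIME_PER_CPI_S = 100.0   # one CPI on one node
N_CPIS = 28
PARALLEL_TIME_S = 200.0  # all 28 CPIs on the cluster

# Assumption: a single processor would handle the CPIs one after another.
estimated_serial_time_s = TIME_PER_CPI_S * N_CPIS
speedup = estimated_serial_time_s / PARALLEL_TIME_S

print(f"estimated serial time: {estimated_serial_time_s:.0f} s")
print(f"speedup: {speedup:.0f}x")
```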


•  Demonstrates  pMatlab  in  a  large  multi-stage  application 

-  ~13,000 lines of Matlab code

*  Driving  new  pMatlab  features 

-  Parallel  sparse  matrices  for  targets  (dynamic  data  sizes) 

Potential  enabler  for  a  whole  new  class  of  parallel  algorithms 
Applying  to  DARPA  HPCS  GraphTheory  and  NSA  benchmarks 

-  Mapping  functions  for  system  integration 

-  Needs  expert  components! 

GMTI pMatlab Implementation


•  GMTI  pMatlab  code  fragment 


% Create distribution spec: b = block, c = cyclic.
dist_spec(1).dist = 'b';
dist_spec(2).dist = 'c';

% Create Parallel Map.
pMap = map([1 MAPPING.Ncpus],dist_spec,0:MAPPING.Ncpus-1);

% Get local indices.
[lind.dim_1_ind lind.dim_2_ind] = global_ind(zeros(1,C*D,pMap));

% loop over local part
for index = 1:length(lind.dim_2_ind)
    ...
end

*  pMatlab  primarily  used  for  determining  which  CPIs  to  work  on 
-  CPIs  dealt  out  using  a  cyclic  distribution 
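
Dealing out work with a cyclic distribution means that CPU k takes items k, k + Ncpus, k + 2*Ncpus, and so on. The Python sketch below shows that index pattern on its own, without pMatlab. The function name and the small problem size are illustrative, and the indices are 0-based here whereas Matlab's are 1-based.

```python
def cyclic_indices(n_items, n_cpus, rank):
    """Return the item indices that a given rank owns under a cyclic distribution."""
    return list(range(rank, n_items, n_cpus))


# Small illustrative case: 10 CPIs dealt out to 4 CPUs.
N_CPIS = 10
N_CPUS = 4
assignment = {rank: cyclic_indices(N_CPIS, N_CPUS, rank) for rank in range(N_CPUS)}

for rank, cpis in assignment.items():
    print(f"CPU {rank} processes CPIs {cpis}")
```

Each CPU then loops over only its own list, which mirrors the loop over the local part in the fragment above.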

*  Most complex pMatlab application built (at that time)

-  ~4000 lines of Matlab code

-  CornerTurns  of  ~1  GByte  data  cubes 

*  Drove  new  pMatlab  features 

-  Improving  Corner  turn  performance 

Working  with  Mathworks  to  improve 

-  Selection  of  submatrices 

Will  be  a  key  enabler  for  parallel  linear  algebra  (LU,  QR,  ...) 

-  Large  memory  footprint  applications 

Can  the  file  system  be  used  more  effectively 

SAR pMatlab code fragment


% Create Parallel Maps.
mapA = map([1 Ncpus],0:Ncpus-1);
mapB = map([Ncpus 1],0:Ncpus-1);

% Prepare distributed Matrices.
fd_midc = zeros(mw,TotalnumPulses,mapA);
fd_midr = zeros(mw,TotalnumPulses,mapB);

% Corner Turn (columns to rows).
fd_midr(:,:) = fd_midc;


Corner turn communication performed by overloaded '=' operator

-  Determines which pieces of the matrix belong where

-  Executes  appropriate  MatlabMPI  send  commands 
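
The bookkeeping behind a corner turn can be sketched without any real messaging. In the Python example below, a matrix starts out split by columns across a set of ranks and ends up split by rows; for every (source, destination) pair the code works out which sub-block moves and copies it. Everything runs in one process, contiguous block splits are assumed, and the function names are hypothetical. It illustrates the redistribution pattern and is not how pMatlab implements it.

```python
def block_ranges(n, parts):
    """Split range(n) into `parts` contiguous blocks whose sizes differ by at most 1."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def corner_turn(col_pieces, n_rows, n_cols):
    """Redistribute column-split pieces into row-split pieces.

    col_pieces[src][r][local_c] holds every row of the columns owned by src.
    The result row_pieces[dst][local_r][c] holds every column of the rows owned by dst.
    """
    n_ranks = len(col_pieces)
    row_blocks = block_ranges(n_rows, n_ranks)
    col_blocks = block_ranges(n_cols, n_ranks)
    row_pieces = [[[None] * n_cols for _ in rows] for rows in row_blocks]
    for src, cols in enumerate(col_blocks):
        for dst, rows in enumerate(row_blocks):
            # The "message" from src to dst is the sub-block rows x cols.
            for local_r, r in enumerate(rows):
                for local_c, c in enumerate(cols):
                    row_pieces[dst][local_r][c] = col_pieces[src][r][local_c]
    return row_pieces


# Demonstration on a 4 x 6 matrix split across 2 ranks.
n_rows, n_cols, n_ranks = 4, 6, 2
full = [[10 * r + c for c in range(n_cols)] for r in range(n_rows)]
col_pieces = [
    [[full[r][c] for c in cols] for r in range(n_rows)]
    for cols in block_ranges(n_cols, n_ranks)
]
row_pieces = corner_turn(col_pieces, n_rows, n_cols)

for rank, piece in enumerate(row_pieces):
    print(f"rank {rank} now holds rows {piece}")
```

Every rank exchanges a sub-block with every other rank, so the whole data cube crosses the interconnect, which is consistent with the corner turn being singled out as a performance concern for the SAR application.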

SAR Parallel Performance


Corner  Turn  bandwidth 


*  Application  memory  requirements  too  large  for  1  CPU 

*  pMatlab  a  requirement  for  this  application 

*  Corner  Turn  performance  is  limiting  factor 

*  Optimization  efforts  have  improved  time  by  30% 

*  Believe  additional  improvement  is  possible 

*  Final Integration

-  Debug  pMatlab  on  plane 

-  Working  ~1  week  before  mission  (~1  week  after  first  flight) 

-  Development  occurred  during  mission 


Flight  Plan 

-  Two  data  collection  flights 

-  Flew  a  50  km  diameter  box 

-  Six  GPS-instrumented  vehicles 

Two  2.5T  trucks 
Two  CUCV's 
Two  M577's 

*  GMTI successfully run on 707 in flight

-  Target  reports 

Range  Doppler  images 

*  Plans  to  use  QuickLook  for  streaming 
processing  in  October  mission 

Embedded Computing Alternatives


Embedded  Computer  Systems 

-  Designed  for  embedded  signal  processing 

-  Advantages 

1.  Rugged  -  Certified  Mil  Spec 

2.  Lab  has  in-house  experience 

-  Disadvantage 

1.  Proprietary  OS  =>  No  Matlab 

Octave 


-  Matlab  “clone” 

-  Advantage 

1.  MatlabMPI  demonstrated  using  Octave 
on  SKY  computer  hardware 

-  Disadvantages 

1.  Less  functionality 

2.  Slower? 

3.  No  object-oriented  support  =>  No 
pMatlab  support  =>  Greater  coding  effort 

*  pMapper: automatically finds best parallel mapping


*  pOoc:  allows  disk  to  be  used  as  memory 


[Figure: pMatlab (N x GByte) with ~1 GByte RAM blocks vs. Petascale pMatlab (N x TByte)]

*  pMex: allows use of optimized parallel libraries (e.g. PVL)


pMatlab  User  Interface 


Matlab*P  Client/Server 


Parallel  Libraries: 

PVL,  ||VSIPL++,  ScaLapack 


pMex 
dmat/ddens 
translator 


pMatlab  Toolbox 

Airborne research platforms typically collect and process
data  later 

pMatlab,  bladed  clusters  and  high  speed  disks  enable 
parallel  processing  in  the  air 

-  Reduces  execution  time  from  hours  to  minutes 

-  Uses  rapid  prototyping  environment  required  for  research 

Successfully  demonstrated  in  LiMIT  Boeing  707 

-  First  ever  in  flight  use  of  bladed  clusters  or  parallel  Matlab 

Planned  for  continued  use 

-  Real  Time  streaming  of  GMTI  to  other  assets 

Drives  new  requirements  for  pMatlab 

-  Expert  mapping 

-  Parallel  Out-of-Core 

-  pMex



